18 research outputs found
Accelerating Nearest Neighbor Search on Manycore Systems
We develop methods for accelerating metric similarity search that are
effective on modern hardware. Our algorithms factor into easily parallelizable
components, making them simple to deploy and efficient on multicore CPUs and
GPUs. Despite the simple structure of our algorithms, their search performance
is provably sublinear in the size of the database, with a factor dependent only
on its intrinsic dimensionality. We demonstrate that our methods provide
substantial speedups on a range of datasets and hardware platforms. In
particular, we present results on a 48-core server machine, on graphics
hardware, and on a multicore desktop
Efficient Bregman Range Search
We develop an algorithm for efficient range search when the notion of dissimilarity is given by a Bregman divergence. The range search task is to return all points in a potentially large database that are within some specified distance of a query. It arises in many learning algorithms such as locally-weighted regression, kernel density estimation, neighborhood graph-based algorithms, and in tasks like outlier detection and information retrieval. In metric spaces, efficient range search-like algorithms based on spatial data structures have been deployed on a variety of statistical tasks. Here we describe an algorithm for range search for an arbitrary Bregman divergence. This broad class of dissimilarity measures includes the relative entropy, Mahalanobis distance, Itakura-Saito divergence, and a variety of matrix divergences. Metric methods cannot be directly applied since Bregman divergences do not in general satisfy the triangle inequality. We derive geometric properties of Bregman divergences that yield an efficient algorithm for range search based on a recently proposed space decomposition for Bregman divergences.
Robust euclidean embedding
We derive a robust Euclidean embedding procedure based on semidefinite programming that may be used in place of the popular classical multidimensional scaling (cMDS) algorithm. We motivate this algorithm by arguing that cMDS is not particularly robust and has several other deficiencies. Generalpurpose semidefinite programming solvers are too memory intensive for medium to large sized applications, so we also describe a fast subgradient-based implementation of the robust algorithm. Additionally, since cMDS is often used for dimensionality reduction, we provide an in-depth look at reducing dimensionality with embedding procedures. In particular, we show that it is NP-hard to find optimal low-dimensional embeddings under a variety of cost functions. 1